Release 10.1A: OpenEdge Getting Started:
Database Essentials


Sizing your database areas

When trying to determine the size of an area, you must look at the makeup of the information being stored in that area. As stated before, an area can contain one or more tables or indexes. The default area should generally be reserved for schema and sequence definitions, as this will make conversions easier in the future. The first step in this process, if you already have an OpenEdge database, is to do a table analysis using the PROUTIL utility. See OpenEdge Data Management: Database Administration for details.

The following sample output shows a portion of a table analysis:

              ---Record---             ---Fragment---          ---Scatter--- 
Table        Records  Bytes      Min  Max     Mean   Count    Factor  Factor 
Work Orders  12,383   6,109,746  60   10,518  493    21,131   1.6     4.0 

After doing a table analysis, focus on record count (Records) and mean record size (Mean). Look at every table and split them according to mean record size. Each record contains approximately 20 bytes of record overhead that is not included in the calculation of the mean. These 20 bytes of overhead take into account the record and the RM block header overhead. For records per block calculations, you must add 20 to the mean record size.

Block sizes

In the vast majority of cases, choose an 8KB-block size on UNIX, and a 4KB-block size in Windows. Selecting these block sizes conforms to the operating system block size, which usually yield the best performance. It is good practice to match or be a multiple of the operating system block size, if possible. Matching the database block size to the file system block size will help prevent unneccesary I/O, as shown in Figure 4–1.

Figure 4–1: Matching database and file block sizes

In Windows, the operating system assumes that files and memory are handled in 4KB chunks. This means that all transfers from disk to memory are 4KB in size. The Windows operating system has been highly optimized for 4KB and performs worse at an 8KB setting in the majority of cases.

On UNIX operating systems, the block size is generally 8KB or a multiple of 8K. The block size is tunable. Generally, an 8K-block size is best on UNIX systems.

There are exceptions to every rule. The intention is to make a best estimate that will enhance performance and assist OpenEdge in meshing with your operating system.

Records per block

OpenEdge allows you specify a maximum number of records per block per area value within the range of 1 to 256. The number of records per block must be a power of 2 (1, 2, 4, 8..., 256). Depending on the actual size of your record, you may not be able fit the specified maximum number of records in a block.

To determine the number of records per block:

  1. Take the mean record size from your table analysis.
  2. Add 20 to the record size.
  3. For an 8K block size, divide 8192 by the number in Step 2. For a 4K block size, divide 4096 by the number in Step 2.
  4. Round the number from Step 3 up or down to the closest power of 2.

Most of the time, the record length will not divide into this number evenly so you must make a best estimate. If your estimate includes too many records per block, you run the risk of fragmentation (records spanning multiple blocks). If your estimate includes too few records per block, you waste space in the blocks. The goal is to be as accurate as possible without making your database structure too complex.

The following example demonstrates how to determine the best records per block seeting for a table in a database with 1 million records and a mean record size of 41 bytes:

Add the record overhead (approximately 20 bytes) to determine the number of the actual size of the stored record, as shown:

Mean record size (41) + record overhead (20) = actual storage size (61) 

Divide that number into your database block size to determine the optimal records per block, as shown:

Database block size (8192) / actual storage size (61) = optimal records per 
block (134) 

Here is your decision point. You must choose a power of 2 from 1 to 256 for the records per block. This leaves you with two choices: 128 and 256. If you choose 128, you will run out of record slots before you run out of space in the block. If you choose 256, you run the risk of record fragmentation. Make your choice according to the nature of the records. If the records grow dynamically, then you should choose the lower number (128) to avoid fragmentation. If the records are added, but not typically updated, and are static in size, you should choose the higher number (256) of records per block. Generally OpenEdge will not fragment a record on insert; most fragmentation happens on update. Records that are updated frequently are likely to increase in size.

Also, if you choose the lower value, you can determine this cost in terms of disk space. To do this, take the number of records in the table and divide by the number of records per block to determine the number of blocks that will be allocated for record storage, as shown:

Number of records (1,000,000) / records per block (128) = allocated blocks 
(7813) 

Next, calculate the number of unused bytes per block by multiplying the actual storage size of the record by the number of records per block and subtracting this number from the database block size, as shown:

Database block size (8192) – (Actual storage size (61) * records per block 
(128) = Unused space per block (384) 

Take the number of allocated blocks and multiply them by the unused space per block to determine the total unused space, as shown:

Allocated blocks (7813) * unused space per block (384) = total unused space 
(3000192) 

In this case, the total unused space that would result in choosing the lower records per blocks is less than 3MB. In terms of disk space, the cost is fairly low to virtually eliminate fragmentation. However, you should still choose the higher number for static records, as you will be able to fully populate your blocks and get more records per read into the buffer pool.

Unused slots

Another issue to consider when choosing to use 256 records per block is that you will not be using all the slots in the block. Since the number of records determines the number of blocks for an area, it might be important to have all of the entries used to obtain the maximum number of records for the table.

The following table shows the maximum number of blocks for an area by the records-per-block setting:

Records per block
Maximum number of blocks for area
4
536,870,911
64
33,554,432
256
8,338,607

Determining space to allocate per area

You must determine the quantity of space to allocate per area. OpenEdge is fairly good about keeping data and index storage at reasonable compaction levels. Most data areas are kept from 90 to 95 percent full, and indexes are generally maintained at 95 percent efficiency in the best case. It is generally advisable to use an 85 percent ratio. This is a reasonable ratio. Using the 1-million-record example previously discussed, you can see that the records plus overhead would take 61 million bytes of storage, as shown:

(41 bytes (Mean record size) + 20 bytes (overhead)) * 1,000,000 (Number of 
records) = 61 million bytes 

This is only actual record storage. Now, take this value and divide it by the expected fill ratio. The lower the ratio, the more conservative the estimate. For example:

61,000,000 /.85 = 71,764,706 bytes (total storage needed) 

To determine the size in blocks, divide this number by 1KB (1024 bytes). This step is necessary because the amount of space needed will be expressed in the structure description file (dbname.st) in kilobytes regardless of the block size of the database. For example:

71,764,706 / 1024 = 70083 (1KB blocks) 

If there are other objects to be stored with this table in a storage area, you should do the same calculations for each object and sum the individual calculations to determine the total amount of storage necessary. Above the current storage requirement, you need to factor storage for future growth requirements.

Distributing tables across storage areas

Now that you know how many records can fit in each block optimally, you can review how to distribute the tables across storage areas. Some of the more common reasons to split information across areas include:

Breaking a table out to its own area concerns how the table is populated and accessed. In those cases where records are added to a table in primary index order, most of the accesses to these records are done in sequential order via the primary index. There might be a performance benefit in isolating the table. If this is a large table, the performance benefit gained through isolation can be significant. There are two reasons for the performance improvement:

Finally, databases can contain different types of data in terms of performance requirements. Some data, such as inventory records, is accessed frequently, while other data, such as comments, is stored and read infrequently. By using storage areas you can place frequently accessed data on a “fast disk.” However, this approach does require knowledge of the application.

Using extents

Most DBAs choose powers of 2 for their extent sizes because these numbers are easier to monitor from the operating system level. Each area should have a variable extent as the last area to allow for growth. Monitoring and trending should keep you from needing this last extent, but it is preferable to have it available if you need it.

For the previous example, you can choose one 102,400KB extent to store the data, with room for expansion, and one variable extent; or you could choose eight fixed 10MB extents and one variable extent.

Extents allow you to distribute your data over multiple physical volumes if you do not use RAID on your system. For example, if you chose eight fixed 10MB extents and one variable extent for your area, you can “stripe” your extents across three drives, as shown in Figure 4–2. You put the first, fourth, and seventh extents on the first drive, the second, fifth, and eighth extents on the second drive, and the third, sixth, and variable extents on the third drive. OpenEdge fills these extents in order, so the first 10MB of data goes into the first extent, the second 10MB of data goes into the second extent, and so on. By striping the extents, you will have a mix of old and new data. While striping your extents is probably not as efficient as hardware striping, it does help you eliminate variance on your drives.

Figure 4–2: Manually striped extents

Even if you do have hardware striping, you might want to have multiple extents. The default file limit is 2GB per extent. If you want to store more than this amount, you need to have multiple extents per area, or you can enable large file support, which in theory allows you to allocate extents up to 16TB in size.

While it is possible to have one very large extent, this will not give you the best performance. The best size varies across operating systems, but 1GB seems to be a safe number across all operating systems with modern file systems. In Windows, you want to use NTFS file systems for best performance.

Index storage

While record storage is fairly easy to calculate, index storage is not. Index compression makes calculation difficult. The ever-evolving compression algorithms make the calculation even harder. In an effort to make things easier, you can look at a database analysis and use the information from that activity to make your decisions. Remember to add room for growth and general overhead, just like with data storage.

If you have an existing database, you can take statistics to determine index storage size. Without a database, you have to estimate the size. The number and nature of indexes can vary greatly between applications. Word indexes and indexes on character fields tend to use more space, while numeric indexes are significantly more efficient in terms of storage. There are databases where indexes use more storage than data, but these are the exception and not the rule.

In general, indexes account for approximately 30 percent of total storage. Therefore, you can take 50 percent of your data storage as an estimate of index storage. Remember that this percent might vary greatly, depending on your schema definition. Consider this estimate as a starting point and adjust and monitor accordingly.

The following example highlights a portion of a database analysis report that shows the proportion of data storage to index storage within an existing database. Use this information to determine the allocation of disk resources to the areas that are going to contain the data, as shown:

SUMMARY FOR AREA “Student Area”: 8 
---------------------------------- 
                   Records               Indexes               Combined 
Name           Size   Tot percent    Size   Tot percent   Size   Tot percent 
PUB.stuact       18.9M  12.6            9.7M   6.4            28.6M   19.0 
PUB.student      30.3M  20.1           20.1M  13.4            50.5M   33.5 
                ----------------------------------------------------------- 
Total           115.3M  76.4           35.6M  23.6           150.8M  100.0 

Primary recovery area

Proper sizing of the primary recovery area, also known as the before-image file, is important to your overall database system. This area is responsible for the recoverability of your database on an ongoing basis. The primary recovery area is written to frequently, and if it is on a slow disk your update performance will be affected. The size of this area varies depending on the length of transactions and the activity on your system.

The primary recovery area is made up of clusters, which are tunable in size. When records are modified, notes are written to this area. If a problem occurs or if the user decides to “undo” the changes, this area is be used to ensure that no partial updates occur.

For example, assume you want to modify all of the records in a table to increase a value by 10 percent. You would want this to happen in an all-or-nothing fashion because you could not determine which records were modified if the process terminated abnormally. In this case, you would have one large transaction that would modify all of the records. If a problem occurs during the transaction, all of the modifications would be rolled back to the original values. Why is this important? If you have several of these processes running simultaneously, the primary recovery area could grow quite large.

The structure of the area is a linked list of clusters. The smaller the cluster size the more frequent the checkpoints occur. A checkpoint is a synchronization point between memory and disk. While there is a potential performance benefit from infrequent checkpoints, this must be tempered with the amount of time it takes to recover the database. Large cluster sizes can also increase database startup and shutdown time when the database needs to back out incomplete transactions or perform crash recovery.

The best way to determine the before-image cluster size is to monitor the database at the time of the day when you make the most updates to the database and review the duration of your checkpoints throughout the week.

Ideally, checkpoints should happen no more than once every two minutes. If you are checkpointing more often, you should increase your before-image cluster size. This does not mean you should decrease the cluster size if it is happening less frequently. The default of 512KB is fine for smaller systems with low update volume, while a value of 1024KB to 4096KB is best for most other systems. The cluster size can be modified from small (8KB) to large (greater than 256 MB).

The cluster size influences the frequency of the checkpoints for the database. As users fill up a cluster with notes, they are also modifying shared memory. The page writers (APWs) are constantly scanning memory, looking for modified buffers to write to disk. At the first checkpoint, all of the modified buffers are put in a queue to be written prior to the next checkpoint. The buffers on the modified buffer queue are written by the page writers at a higher priority than other buffers. If all of the buffers on the queue are written prior to the next checkpoint, it is time to schedule the current modified buffers. This is the goal. If all of the buffers are not written, then you must write all of the previously scheduled buffers first and then schedule the currently modified buffers. If you are checkpointing at the proper frequency and you are still flushing buffers at checkpoint, you should add an additional APW and monitor further. If adding the APW helps but does not eliminate the problem, add one more. If adding the APW does not help, look for a bottleneck on the disks.

The format of the primary recovery area has been discussed, but not its size. There is no formula for determining the proper size because the size of the area is so dependent on the application. Progress Software Corporation recommends that you isolate this area from other portions of the database for performance reasons. If you only have one database, you can isolate this area to a single disk (mirrored pair), as the writes to this area are sequential and would benefit from being placed on a single nonstriped disk. If you have several databases, you might want to store your primary recovery areas on a stripe set (RAID 10) to increase throughput.


Copyright © 2005 Progress Software Corporation
www.progress.com
Voice: (781) 280-4000
Fax: (781) 280-4095